Bi-directional LSTM Recurrent Neural Network for Chinese Word Segmentation
Authors
Abstract
Recurrent neural networks (RNNs) have been broadly applied to natural language processing (NLP) problems. This kind of neural network is designed for modeling sequential data and has proven quite effective in sequential tagging tasks. In this paper, we propose to use a bi-directional RNN with long short-term memory (LSTM) units for Chinese word segmentation, a crucial preprocessing task for modeling Chinese sentences and articles. Classical methods focus on designing and combining hand-crafted features from context, whereas the bi-directional LSTM network (BLSTM) needs no prior knowledge or pre-designed features, and it excels at keeping contextual information in both directions. Experimental results show that our approach achieves state-of-the-art performance in word segmentation on both traditional Chinese and simplified Chinese datasets.
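As a brief illustration of the "segmentation as sequential tagging" framing the abstract relies on (not code from the paper itself), Chinese word segmentation is commonly cast as labelling each character with a BMES tag: B (begin), M (middle), E (end) of a multi-character word, or S for a single-character word. The BLSTM then predicts one tag per character. The helper names below are hypothetical; this is a minimal sketch of the tag encoding and decoding, not the model:

```python
def words_to_tags(words):
    """Convert a segmented sentence (a list of words) to per-character BMES tags."""
    tags = []
    for w in words:
        if len(w) == 1:
            tags.append("S")                  # single-character word
        else:
            tags.append("B")                  # first character of the word
            tags.extend("M" * (len(w) - 2))   # interior characters, if any
            tags.append("E")                  # last character of the word
    return tags

def tags_to_words(chars, tags):
    """Recover the word sequence from characters and their predicted BMES tags."""
    words, buf = [], ""
    for c, t in zip(chars, tags):
        buf += c
        if t in ("S", "E"):   # a word ends at S or E
            words.append(buf)
            buf = ""
    if buf:                   # tolerate a dangling B/M at sentence end
        words.append(buf)
    return words
```

Training data in this scheme is just `(characters, tags)` pairs produced by `words_to_tags`, and the model's per-character tag predictions are turned back into a segmentation with `tags_to_words`; the round trip is lossless for well-formed tag sequences.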
Related resources
Is Local Window Essential for Neural Network Based Chinese Word Segmentation?
Neural network based Chinese Word Segmentation (CWS) approaches can bypass the burdensome feature engineering required by conventional ones. All previous neural network based approaches rely on a local window in the character sequence labelling process. Such a window can hardly exploit the outer context and may preserve irrelevant inner context. Moreover, the size of the local window is a toilsome manual-...
NCTU-NTUT at IJCNLP-2017 Task 2: Deep Phrase Embedding using bi-LSTMs for Valence-Arousal Ratings Prediction of Chinese Phrases
In this paper, a deep phrase embedding approach using bi-directional long short-term memory (Bi-LSTM) neural networks is proposed to predict the valence-arousal ratings of Chinese phrases. It adopts a Chinese word segmentation frontend, local order-aware word- and global phrase-embedding representations, and a deep regression neural network (DRNN) model. The performance of the proposed method wa...
Long Short-Term Memory for Japanese Word Segmentation
This study presents a Long Short-Term Memory (LSTM) neural network approach to Japanese word segmentation (JWS). Previous studies on Chinese word segmentation (CWS) succeeded in using recurrent neural networks such as LSTM and gated recurrent units (GRU). However, in contrast to Chinese, Japanese includes several character types, such as hiragana, katakana, and kanji, that produce orthographic ...
Neural Joint Model for Transition-based Chinese Syntactic Analysis
We present neural network-based joint models for Chinese word segmentation, POS tagging and dependency parsing. Our models are the first neural approaches to fully joint Chinese analysis, which is known to prevent the error propagation problem of pipeline models. Although word embeddings play a key role in dependency parsing, they cannot be applied directly to the joint task in the previous work...
Dependency-based Gated Recursive Neural Network for Chinese Word Segmentation
Recently, many neural network models have been applied to Chinese word segmentation. However, such models focus more on collecting local information, while long-distance dependencies are not well learned. To integrate local features with long-distance dependencies, we propose a dependency-based gated recursive neural network. Local features are first collected by bi-directional long short-term m...
Journal:
Volume, Issue
Pages -
Publication date: 2016